Overview

Dataset statistics

Number of variables: 14
Number of observations: 891221
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 87962
Duplicate rows (%): 9.9%
Total size in memory: 95.2 MiB
Average record size in memory: 112.0 B
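The memory figures above are consistent with all 14 columns being stored as 8-byte floats. That is our assumption, suggested by the "Real number" type; the report does not state the dtype. The sketch below redoes that arithmetic. It also counts duplicate rows the way pandas' `DataFrame.duplicated()` does by default, where every repeat of an earlier row counts and the first occurrence does not. This section does not say whether pandas-profiling uses exactly this definition, so treat `count_duplicate_rows` (a hypothetical helper working on plain tuples rather than a DataFrame) as illustrative.

```python
from collections.abc import Sequence

BYTES_PER_VALUE = 8  # assumption: every column is stored as float64
MIB = 1024 ** 2


def memory_footprint(n_rows: int, n_cols: int) -> tuple[float, float, float]:
    """Return (total MiB, MiB per column, bytes per record), ignoring index overhead."""
    total_bytes = n_rows * n_cols * BYTES_PER_VALUE
    return total_bytes / MIB, n_rows * BYTES_PER_VALUE / MIB, total_bytes / n_rows


def count_duplicate_rows(rows: Sequence[Sequence[float]]) -> int:
    """Rows equal to an earlier row, like DataFrame.duplicated(keep="first").sum()."""
    seen: set[tuple[float, ...]] = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


total, per_column, per_record = memory_footprint(891221, 14)
print(f"{total:.1f} MiB total, {per_column:.1f} MiB per column, {per_record:.1f} B per record")
# 95.2 MiB total, 6.8 MiB per column, 112.0 B per record
print(f"{87962 / 891221:.1%}")  # 9.9%, the duplicate-row share reported above
print(count_duplicate_rows([(1, 2), (1, 2), (3, 4), (1, 2)]))  # 2
```

The per-column figure matches the "Memory size: 6.8 MiB" reported for every variable below.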

Variable types

Numeric: 14

Warnings

Dataset has 87962 (9.9%) duplicate rows (Duplicates)
SEMIO_DOM is highly correlated with SEMIO_KAEM and 4 other fields (High correlation)
SEMIO_ERL is highly correlated with SEMIO_FAM and 2 other fields (High correlation)
SEMIO_FAM is highly correlated with SEMIO_ERL and 3 other fields (High correlation)
SEMIO_KAEM is highly correlated with SEMIO_DOM and 5 other fields (High correlation)
SEMIO_KRIT is highly correlated with SEMIO_DOM and 4 other fields (High correlation)
SEMIO_KULT is highly correlated with SEMIO_DOM and 7 other fields (High correlation)
SEMIO_MAT is highly correlated with SEMIO_REL (High correlation)
SEMIO_PFLICHT is highly correlated with SEMIO_RAT and 2 other fields (High correlation)
SEMIO_RAT is highly correlated with SEMIO_PFLICHT and 1 other field (High correlation)
SEMIO_REL is highly correlated with SEMIO_ERL and 5 other fields (High correlation)
SEMIO_SOZ is highly correlated with SEMIO_DOM and 4 other fields (High correlation)
SEMIO_TRADV is highly correlated with SEMIO_PFLICHT and 2 other fields (High correlation)
SEMIO_VERT is highly correlated with SEMIO_DOM and 4 other fields (High correlation)
SEMIO_DOM is highly correlated with SEMIO_KAEM and 4 other fields (High correlation)
SEMIO_ERL is highly correlated with SEMIO_FAM and 2 other fields (High correlation)
SEMIO_FAM is highly correlated with SEMIO_ERL and 3 other fields (High correlation)
SEMIO_KAEM is highly correlated with SEMIO_DOM and 5 other fields (High correlation)
SEMIO_KRIT is highly correlated with SEMIO_DOM and 4 other fields (High correlation)
SEMIO_KULT is highly correlated with SEMIO_DOM and 7 other fields (High correlation)
SEMIO_MAT is highly correlated with SEMIO_REL (High correlation)
SEMIO_PFLICHT is highly correlated with SEMIO_RAT and 2 other fields (High correlation)
SEMIO_RAT is highly correlated with SEMIO_PFLICHT and 1 other field (High correlation)
SEMIO_REL is highly correlated with SEMIO_ERL and 5 other fields (High correlation)
SEMIO_SOZ is highly correlated with SEMIO_DOM and 4 other fields (High correlation)
SEMIO_TRADV is highly correlated with SEMIO_PFLICHT and 2 other fields (High correlation)
SEMIO_VERT is highly correlated with SEMIO_DOM and 4 other fields (High correlation)
SEMIO_DOM is highly correlated with SEMIO_KAEM and 1 other field (High correlation)
SEMIO_ERL is highly correlated with SEMIO_FAM and 1 other field (High correlation)
SEMIO_FAM is highly correlated with SEMIO_ERL and 2 other fields (High correlation)
SEMIO_KAEM is highly correlated with SEMIO_DOM and 3 other fields (High correlation)
SEMIO_KRIT is highly correlated with SEMIO_KAEM and 1 other field (High correlation)
SEMIO_KULT is highly correlated with SEMIO_FAM and 1 other field (High correlation)
SEMIO_PFLICHT is highly correlated with SEMIO_RAT and 1 other field (High correlation)
SEMIO_RAT is highly correlated with SEMIO_PFLICHT (High correlation)
SEMIO_REL is highly correlated with SEMIO_ERL and 2 other fields (High correlation)
SEMIO_SOZ is highly correlated with SEMIO_VERT (High correlation)
SEMIO_VERT is highly correlated with SEMIO_DOM and 3 other fields (High correlation)
SEMIO_KULT is highly correlated with SEMIO_FAM and 12 other fields (High correlation)
SEMIO_FAM is highly correlated with SEMIO_KULT and 12 other fields (High correlation)
SEMIO_PFLICHT is highly correlated with SEMIO_KULT and 12 other fields (High correlation)
SEMIO_MAT is highly correlated with SEMIO_KULT and 12 other fields (High correlation)
SEMIO_SOZ is highly correlated with SEMIO_KULT and 12 other fields (High correlation)
SEMIO_KAEM is highly correlated with SEMIO_KULT and 12 other fields (High correlation)
SEMIO_KRIT is highly correlated with SEMIO_KULT and 12 other fields (High correlation)
SEMIO_TRADV is highly correlated with SEMIO_KULT and 12 other fields (High correlation)
SEMIO_DOM is highly correlated with SEMIO_KULT and 12 other fields (High correlation)
SEMIO_RAT is highly correlated with SEMIO_KULT and 12 other fields (High correlation)
SEMIO_REL is highly correlated with SEMIO_KULT and 12 other fields (High correlation)
SEMIO_LUST is highly correlated with SEMIO_KULT and 12 other fields (High correlation)
SEMIO_ERL is highly correlated with SEMIO_KULT and 12 other fields (High correlation)
SEMIO_VERT is highly correlated with SEMIO_KULT and 12 other fields (High correlation)
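Each warning above names one correlated partner and counts the other variables whose correlation also exceeds the report's threshold. The following is a hypothetical sketch of how such messages could be produced from a correlation matrix. `correlation_warnings` and its default `threshold=0.9` are our own illustration; the thresholds actually used are stored in the `correlations` section of config.json. Partners are ordered here by decreasing absolute correlation, which may not match the report's ordering, and the singular "field" is used when only one other variable is involved.

```python
def correlation_warnings(
    names: list[str], matrix: list[list[float]], threshold: float = 0.9
) -> list[str]:
    """Build 'X is highly correlated with Y and N other fields' messages.

    `threshold` is a hypothetical default; partners are sorted by
    decreasing absolute correlation.
    """
    messages = []
    for i, name in enumerate(names):
        partners = sorted(
            (j for j in range(len(names)) if j != i and abs(matrix[i][j]) > threshold),
            key=lambda j: -abs(matrix[i][j]),
        )
        if not partners:
            continue
        message = f"{name} is highly correlated with {names[partners[0]]}"
        others = len(partners) - 1
        if others:
            message += f" and {others} other field{'s' if others > 1 else ''}"
        messages.append(message)
    return messages


names = ["A", "B", "C"]
matrix = [
    [1.0, 0.95, -0.92],
    [0.95, 1.0, 0.50],
    [-0.92, 0.50, 1.0],
]
for line in correlation_warnings(names, matrix):
    print(line)
# A is highly correlated with B and 1 other field
# B is highly correlated with A
# C is highly correlated with A
```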

Reproduction

Analysis started: 2021-05-17 14:08:13.455196
Analysis finished: 2021-05-17 14:10:05.722150
Duration: 1 minute and 52.27 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

SEMIO_DOM
Real number (ℝ≥0)

HIGH CORRELATION (4 warnings)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.667550473
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 5
Q3: 6
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 3
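Every variable takes only the values 1 to 7, so the quantile statistics can be recomputed from a value-to-count table without expanding all 891221 observations. The sketch below assumes linear interpolation between neighbouring order statistics, which is the default of pandas' `Series.quantile`. `DOM_COUNTS` is copied from the SEMIO_DOM frequency table in this section, and the helper names are hypothetical. The same machinery also gives the median absolute deviation (MAD) reported under the descriptive statistics.

```python
from bisect import bisect_right
from itertools import accumulate
from math import floor

# SEMIO_DOM value -> count, copied from the frequency table in this report
DOM_COUNTS = {1: 44762, 2: 101498, 3: 97027, 4: 125115, 5: 177889, 6: 183435, 7: 161495}


def value_at(counts: dict[float, int], k: int) -> float:
    """Return the k-th smallest observation (0-based) of a value -> count table."""
    values = sorted(counts)
    cumulative = list(accumulate(counts[v] for v in values))
    # first value whose cumulative count exceeds k
    return values[bisect_right(cumulative, k)]


def quantile(counts: dict[float, int], q: float) -> float:
    """q-quantile with linear interpolation between neighbouring order statistics."""
    n = sum(counts.values())
    h = (n - 1) * q
    lo = floor(h)
    hi = min(lo + 1, n - 1)
    v_lo, v_hi = value_at(counts, lo), value_at(counts, hi)
    return v_lo + (h - lo) * (v_hi - v_lo)


def median_absolute_deviation(counts: dict[float, int]) -> float:
    """Median of |x - median(x)|, computed on the count table."""
    med = quantile(counts, 0.5)
    deviations: dict[float, int] = {}
    for value, count in counts.items():
        d = abs(value - med)
        deviations[d] = deviations.get(d, 0) + count
    return quantile(deviations, 0.5)


for label, q in [("5th percentile", 0.05), ("Q1", 0.25), ("Median", 0.5),
                 ("Q3", 0.75), ("95th percentile", 0.95)]:
    print(label, quantile(DOM_COUNTS, q))  # 1.0, 3.0, 5.0, 6.0, 7.0
q1, q3 = quantile(DOM_COUNTS, 0.25), quantile(DOM_COUNTS, 0.75)
print("IQR", q3 - q1)  # 3.0
print("MAD", median_absolute_deviation(DOM_COUNTS))  # 1.0
```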

Descriptive statistics

Standard deviation: 1.79571208
Coefficient of variation (CV): 0.3847225843
Kurtosis: -0.9093931002
Mean: 4.667550473
Median Absolute Deviation (MAD): 1
Skewness: -0.4131770759
Sum: 4159819
Variance: 3.224581876
Monotonicity: Not monotonic
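The moment-based statistics follow from the same value-to-count table. This sketch assumes pandas' conventions for `Series.var`, `Series.skew` and `Series.kurt`: sample variance with n - 1 in the denominator, the adjusted Fisher-Pearson skewness, and bias-corrected excess kurtosis (0 for a normal distribution). The coefficient of variation is the standard deviation divided by the mean. `describe` is a hypothetical helper, and `DOM_COUNTS` is again copied from the SEMIO_DOM frequency table.

```python
from math import sqrt

# SEMIO_DOM value -> count, copied from the frequency table in this report
DOM_COUNTS = {1: 44762, 2: 101498, 3: 97027, 4: 125115, 5: 177889, 6: 183435, 7: 161495}


def describe(counts: dict[int, int]) -> dict[str, float]:
    """Moment-based statistics of a value -> count table (pandas conventions assumed)."""
    n = sum(counts.values())
    total = sum(value * count for value, count in counts.items())
    mean = total / n
    # Central moments with 1/n normalisation
    m2 = sum(c * (v - mean) ** 2 for v, c in counts.items()) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in counts.items()) / n
    m4 = sum(c * (v - mean) ** 4 for v, c in counts.items()) / n
    variance = m2 * n / (n - 1)  # sample variance (ddof=1)
    std = sqrt(variance)
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skewness = g1 * sqrt(n * (n - 1)) / (n - 2)  # adjusted Fisher-Pearson
    kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))  # bias-corrected excess
    return {
        "mean": mean,
        "sum": total,
        "variance": variance,
        "std": std,
        "cv": std / mean,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


for name, value in describe(DOM_COUNTS).items():
    print(f"{name}: {value:.6f}")
```

With roughly 890,000 observations the small-sample corrections barely move the results, so the printed values should agree with the SEMIO_DOM figures above to the displayed precision.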

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
6      183435  20.6%
5      177889  20.0%
7      161495  18.1%
4      125115  14.0%
2      101498  11.4%
3      97027   10.9%
1      44762   5.0%

SEMIO_ERL
Real number (ℝ≥0)

HIGH CORRELATION (4 warnings)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.481404725
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 3
Median: 4
Q3: 6
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.807551899
Coefficient of variation (CV): 0.4033449354
Kurtosis: -1.105118991
Mean: 4.481404725
Median Absolute Deviation (MAD): 1
Skewness: -0.04328868758
Sum: 3993922
Variance: 3.267243868
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
4      196206  22.0%
3      180824  20.3%
7      179141  20.1%
6      139209  15.6%
2      77012   8.6%
5      76133   8.5%
1      42696   4.8%

SEMIO_FAM
Real number (ℝ≥0)

HIGH CORRELATION (4 warnings)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.272729211
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 4
Q3: 6
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.915885028
Coefficient of variation (CV): 0.4483984202
Kurtosis: -1.19892824
Mean: 4.272729211
Median Absolute Deviation (MAD): 2
Skewness: -0.2058485462
Sum: 3807946
Variance: 3.67061544
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
6      186729  21.0%
2      139562  15.7%
4      135942  15.3%
5      133740  15.0%
7      118517  13.3%
3      94815   10.6%
1      81916   9.2%

SEMIO_KAEM
Real number (ℝ≥0)

HIGH CORRELATION (4 warnings)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.445007467
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 5
Q3: 6
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.852412242
Coefficient of variation (CV): 0.4167399618
Kurtosis: -1.236079929
Mean: 4.445007467
Median Absolute Deviation (MAD): 2
Skewness: -0.1927373604
Sum: 3961484
Variance: 3.431431115
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
6      206001  23.1%
3      180955  20.3%
7      135579  15.2%
5      128501  14.4%
2      114038  12.8%
4      78944   8.9%
1      47203   5.3%

SEMIO_KRIT
Real number (ℝ≥0)

HIGH CORRELATION (4 warnings)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.76322259
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 5
Q3: 6
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.830789358
Coefficient of variation (CV): 0.3843593961
Kurtosis: -0.8752505716
Mean: 4.76322259
Median Absolute Deviation (MAD): 2
Skewness: -0.3882238006
Sum: 4245084
Variance: 3.351789675
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
7      219847  24.7%
5      156298  17.5%
4      144079  16.2%
6      133049  14.9%
3      129106  14.5%
1      54947   6.2%
2      53895   6.0%

SEMIO_KULT
Real number (ℝ≥0)

HIGH CORRELATION (4 warnings)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.025013998
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 4
Q3: 5
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.90381623
Coefficient of variation (CV): 0.4729961763
Kurtosis: -1.05018586
Mean: 4.025013998
Median Absolute Deviation (MAD): 1
Skewness: -0.03536077331
Sum: 3587177
Variance: 3.624516239
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
3      209067  23.5%
5      176282  19.8%
1      128216  14.4%
7      117378  13.2%
4      101502  11.4%
6      101286  11.4%
2      57490   6.5%

SEMIO_LUST
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.359086018
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 5
Q3: 6
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.022829383
Coefficient of variation (CV): 0.4640489714
Kurtosis: -1.207111475
Mean: 4.359086018
Median Absolute Deviation (MAD): 2
Skewness: -0.3030834485
Sum: 3884909
Variance: 4.091838712
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
5      170040  19.1%
6      158624  17.8%
7      158234  17.8%
2      114373  12.8%
1      110382  12.4%
4      97495   10.9%
3      82073   9.2%

SEMIO_MAT
Real number (ℝ≥0)

HIGH CORRELATION (3 warnings)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.001596686
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 4
Q3: 5
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.857540107
Coefficient of variation (CV): 0.4641997315
Kurtosis: -1.036445759
Mean: 4.001596686
Median Absolute Deviation (MAD): 1
Skewness: 0.01186754272
Sum: 3566307
Variance: 3.45045525
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
5      171267  19.2%
4      162862  18.3%
2      134549  15.1%
3      123701  13.9%
7      111976  12.6%
1      97341   10.9%
6      89525   10.0%

SEMIO_PFLICHT
Real number (ℝ≥0)

HIGH CORRELATION (4 warnings)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.256075654
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 4
Q3: 6
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.770136694
Coefficient of variation (CV): 0.4159081836
Kurtosis: -0.8653655223
Mean: 4.256075654
Median Absolute Deviation (MAD): 1
Skewness: -0.1694070752
Sum: 3793104
Variance: 3.133383917
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
5      203845  22.9%
4      162117  18.2%
3      133990  15.0%
7      115458  13.0%
6      109442  12.3%
2      92214   10.3%
1      74155   8.3%

SEMIO_RAT
Real number (ℝ≥0)

HIGH CORRELATION (4 warnings)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.910139012
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 4
Q3: 5
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.580305974
Coefficient of variation (CV): 0.404155957
Kurtosis: -0.3831343065
Mean: 3.910139012
Median Absolute Deviation (MAD): 1
Skewness: 0.2879717072
Sum: 3484798
Variance: 2.497366972
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
4      334456  37.5%
2      140433  15.8%
3      131994  14.8%
5      89056   10.0%
7      87024   9.8%
6      61484   6.9%
1      46774   5.2%

SEMIO_REL
Real number (ℝ≥0)

HIGH CORRELATION (4 warnings)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.240609232
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 4
Q3: 6
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.007372556
Coefficient of variation (CV): 0.4733689067
Kurtosis: -1.134701252
Mean: 4.240609232
Median Absolute Deviation (MAD): 2
Skewness: 0.00215072809
Sum: 3779320
Variance: 4.029544578
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
7      211377  23.7%
4      207128  23.2%
3      150801  16.9%
1      108130  12.1%
5      79566   8.9%
2      73127   8.2%
6      61092   6.9%

SEMIO_SOZ
Real number (ℝ≥0)

HIGH CORRELATION (4 warnings)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.945859669
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 4
Q3: 6
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.946564233
Coefficient of variation (CV): 0.4933181603
Kurtosis: -1.353534476
Mean: 3.945859669
Median Absolute Deviation (MAD): 2
Skewness: 0.1789455842
Sum: 3516633
Variance: 3.789112312
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
2      244714  27.5%
6      136205  15.3%
5      121786  13.7%
3      118889  13.3%
7      117378  13.2%
4      90161   10.1%
1      62088   7.0%

SEMIO_TRADV
Real number (ℝ≥0)

HIGH CORRELATION (3 warnings)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.661784226
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 3
Q3: 5
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.707636767
Coefficient of variation (CV): 0.4663400849
Kurtosis: -0.655924441
Mean: 3.661784226
Median Absolute Deviation (MAD): 1
Skewness: 0.3343106362
Sum: 3263459
Variance: 2.916023328
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
3      226571  25.4%
4      174203  19.5%
2      132657  14.9%
5      117378  13.2%
1      96775   10.9%
7      76133   8.5%
6      67504   7.6%

SEMIO_VERT
Real number (ℝ≥0)

HIGH CORRELATION (4 warnings)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.023709046
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 4
Q3: 6
95th percentile: 7
Maximum: 7
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.077746254
Coefficient of variation (CV): 0.5163758687
Kurtosis: -1.411240267
Mean: 4.023709046
Median Absolute Deviation (MAD): 2
Skewness: -0.03560142861
Sum: 3586014
Variance: 4.317029496
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Value  Count   Frequency (%)
2      204333  22.9%
6      141714  15.9%
5      135205  15.2%
7      134756  15.1%
4      122982  13.8%
1      120437  13.5%
3      31794   3.6%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
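A direct transcription of that definition in plain Python, offered as an illustrative sketch. pandas exposes the same statistic as `Series.corr` with its default `method="pearson"`. The normalising factors of the covariance and of the two standard deviations cancel, so raw sums of products and squares are enough. r is undefined when either variable is constant, which the helper reports as an error. The examples also check the invariance to location and scale described above.

```python
from collections.abc import Sequence
from math import sqrt


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of products/squares; the shared 1/(n - 1) factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))                        # 0.8
print(pearson_r(x, [10 * v - 3 for v in y]))  # 0.8, unchanged by location and scale
print(pearson_r(x, [-2 * v for v in x]))      # -1.0 for any falling line
```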

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, which makes it better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
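Spearman's ρ is therefore Pearson's r computed on ranks. Ties, which are everywhere in 7-point variables like these, are commonly given the average of the ranks they span (the default of pandas' `Series.rank`). The sketch below assumes that convention and repeats a small Pearson helper so that it runs on its own; the function names are hypothetical. The cubic example shows ρ reaching 1 for a monotonic but nonlinear relationship where r stays below 1.

```python
from collections.abc import Sequence
from math import sqrt


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # grow the block of tied values
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance over the product of standard deviations (1/(n-1) factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r of the (average) ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
cubes = [v ** 3 for v in x]
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho(x, cubes))           # 1.0: perfectly monotonic
print(round(pearson_r(x, cubes), 3))    # below 1: the relation is not linear
```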

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
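The count-based definition above translates directly into an O(n²) loop over all pairs. With tied values, the plain formula (often called τ-a) can no longer reach ±1, so pandas and SciPy default to τ-b, which divides instead by the geometric mean of the number of pairs not tied in X and the number not tied in Y. This hypothetical `kendall_tau` helper computes both as an illustration rather than an efficient implementation.

```python
from collections.abc import Sequence
from math import sqrt


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Return (tau_a, tau_b) by checking every pair of observations, O(n^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    untied_x = untied_y = 0  # pairs whose x (resp. y) values differ
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            untied_x += dx != 0
            untied_y += dy != 0
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau is undefined when a variable is constant")
    total_pairs = n * (n - 1) // 2
    tau_a = (concordant - discordant) / total_pairs
    tau_b = (concordant - discordant) / sqrt(untied_x * untied_y)
    return tau_a, tau_b


print(tuple(round(t, 3) for t in kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])))  # (0.667, 0.667)
print(tuple(round(t, 3) for t in kendall_tau([1, 1, 2], [1, 2, 3])))        # (0.667, 0.816)
```

In the second example the tie in x leaves τ-a at 2/3 even though the data never disagree in order, while τ-b corrects for the tied pair.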

Phik (φk)

Phik (φk) is a newer, practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependencies, and reduces to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available in the phik project's documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
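For a dataset without missing cells, as here, both views are uniformly full. The following is a minimal text-mode sketch of the two ideas. It assumes missing values are represented as `None` (in pandas they would be NaN) and uses made-up example rows with gaps; the function names and the `#`/`.` symbols are hypothetical. The first function gives a per-column count of present values, and the second a matrix with one character per cell.

```python
from collections.abc import Sequence
from typing import Optional

Row = Sequence[Optional[float]]  # None marks a missing cell (assumption)


def non_null_counts(columns: Sequence[str], rows: Sequence[Row]) -> dict[str, int]:
    """Number of non-missing values per column (the 'nullity by column' view)."""
    return {
        name: sum(row[i] is not None for row in rows)
        for i, name in enumerate(columns)
    }


def nullity_matrix(rows: Sequence[Row]) -> list[str]:
    """One text line per row: '#' for a present cell, '.' for a missing one."""
    return ["".join("." if cell is None else "#" for cell in row) for row in rows]


rows = [(6, 3, 6), (7, None, 4), (None, 6, 1)]  # made-up rows with gaps
print(non_null_counts(["a", "b", "c"], rows))   # {'a': 2, 'b': 2, 'c': 3}
print("\n".join(nullity_matrix(rows)))          # ###, #.#, .## on separate lines
```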

Sample

First rows

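Index | SEMIO_DOM | SEMIO_ERL | SEMIO_FAM | SEMIO_KAEM | SEMIO_KRIT | SEMIO_KULT | SEMIO_LUST | SEMIO_MAT | SEMIO_PFLICHT | SEMIO_RAT | SEMIO_REL | SEMIO_SOZ | SEMIO_TRADV | SEMIO_VERT
0 | 6 | 3 | 6 | 6 | 7 | 3 | 5 | 5 | 5 | 4 | 7 | 2 | 3 | 1
1 | 7 | 2 | 4 | 4 | 4 | 3 | 2 | 3 | 7 | 6 | 4 | 5 | 6 | 1
2 | 7 | 6 | 1 | 7 | 7 | 3 | 4 | 3 | 3 | 4 | 3 | 4 | 3 | 4
3 | 4 | 7 | 1 | 5 | 4 | 4 | 4 | 1 | 4 | 3 | 2 | 5 | 4 | 4
4 | 2 | 4 | 4 | 2 | 3 | 6 | 4 | 2 | 4 | 2 | 4 | 6 | 2 | 7
5 | 4 | 2 | 4 | 4 | 4 | 5 | 2 | 4 | 7 | 7 | 7 | 2 | 6 | 2
6 | 4 | 5 | 5 | 7 | 7 | 5 | 6 | 7 | 7 | 7 | 5 | 2 | 7 | 2
7 | 1 | 2 | 7 | 2 | 1 | 7 | 2 | 5 | 5 | 5 | 7 | 7 | 5 | 6
8 | 5 | 4 | 5 | 3 | 5 | 5 | 6 | 1 | 1 | 2 | 4 | 4 | 4 | 5
9 | 6 | 6 | 1 | 7 | 7 | 3 | 6 | 3 | 1 | 4 | 1 | 2 | 3 | 2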

Last rows

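Index | SEMIO_DOM | SEMIO_ERL | SEMIO_FAM | SEMIO_KAEM | SEMIO_KRIT | SEMIO_KULT | SEMIO_LUST | SEMIO_MAT | SEMIO_PFLICHT | SEMIO_RAT | SEMIO_REL | SEMIO_SOZ | SEMIO_TRADV | SEMIO_VERT
891211 | 5 | 4 | 2 | 2 | 5 | 5 | 6 | 6 | 3 | 5 | 4 | 4 | 4 | 5
891212 | 3 | 7 | 4 | 2 | 3 | 6 | 7 | 6 | 1 | 1 | 3 | 6 | 1 | 6
891213 | 5 | 7 | 2 | 6 | 4 | 2 | 5 | 5 | 2 | 3 | 4 | 2 | 3 | 3
891214 | 7 | 4 | 4 | 6 | 4 | 2 | 2 | 3 | 5 | 6 | 4 | 5 | 6 | 1
891215 | 7 | 5 | 5 | 7 | 6 | 5 | 6 | 7 | 6 | 7 | 5 | 2 | 7 | 2
891216 | 7 | 6 | 1 | 5 | 4 | 3 | 1 | 3 | 4 | 4 | 3 | 2 | 2 | 2
891217 | 4 | 7 | 4 | 4 | 4 | 4 | 7 | 5 | 6 | 4 | 7 | 4 | 2 | 4
891218 | 4 | 5 | 2 | 5 | 4 | 5 | 3 | 3 | 6 | 7 | 5 | 5 | 7 | 2
891219 | 2 | 2 | 7 | 2 | 2 | 7 | 3 | 5 | 7 | 5 | 7 | 7 | 5 | 6
891220 | 3 | 3 | 6 | 2 | 3 | 6 | 5 | 4 | 2 | 3 | 3 | 6 | 2 | 6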

Duplicate rows

Most frequently occurring

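Index | SEMIO_DOM | SEMIO_ERL | SEMIO_FAM | SEMIO_KAEM | SEMIO_KRIT | SEMIO_KULT | SEMIO_LUST | SEMIO_MAT | SEMIO_PFLICHT | SEMIO_RAT | SEMIO_REL | SEMIO_SOZ | SEMIO_TRADV | SEMIO_VERT | # duplicates
53413 | 6 | 3 | 6 | 6 | 7 | 3 | 5 | 5 | 5 | 4 | 7 | 2 | 3 | 1 | 73961
8153 | 2 | 4 | 6 | 3 | 5 | 5 | 1 | 6 | 4 | 5 | 4 | 3 | 4 | 7 | 4110
12782 | 3 | 3 | 6 | 2 | 3 | 4 | 5 | 6 | 4 | 1 | 2 | 3 | 1 | 7 | 3285
57986 | 6 | 6 | 1 | 7 | 7 | 1 | 1 | 2 | 4 | 4 | 1 | 2 | 3 | 4 | 2646
56608 | 6 | 5 | 5 | 7 | 7 | 3 | 2 | 7 | 5 | 6 | 5 | 2 | 7 | 4 | 2530
3172 | 2 | 1 | 7 | 2 | 1 | 7 | 2 | 7 | 5 | 4 | 6 | 7 | 5 | 7 | 2331
54310 | 6 | 4 | 4 | 6 | 7 | 3 | 2 | 4 | 5 | 6 | 5 | 2 | 6 | 1 | 2157
62546 | 6 | 7 | 1 | 6 | 7 | 1 | 5 | 4 | 4 | 4 | 4 | 2 | 1 | 4 | 1844
8233 | 2 | 4 | 6 | 3 | 5 | 5 | 4 | 6 | 4 | 5 | 4 | 3 | 4 | 7 | 1799
3229 | 2 | 1 | 7 | 2 | 2 | 7 | 2 | 5 | 7 | 4 | 7 | 7 | 5 | 6 | 1299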
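The table lists each repeated combination of the 14 values together with how often it occurs. The following is a minimal sketch of that tally using `collections.Counter`. `most_frequent_duplicates` is a hypothetical helper, and we assume "# duplicates" counts every occurrence of a combination, including the first, which this section does not state explicitly.

```python
from collections import Counter
from collections.abc import Sequence


def most_frequent_duplicates(
    rows: Sequence[Sequence[int]], n: int = 10
) -> list[tuple[tuple[int, ...], int]]:
    """Return up to n row patterns that occur more than once, most frequent first.

    The count includes the first occurrence of each pattern (an assumption
    about how '# duplicates' is defined).
    """
    counts = Counter(tuple(row) for row in rows)
    repeated = [(pattern, count) for pattern, count in counts.most_common() if count > 1]
    return repeated[:n]


rows = [(1, 2), (3, 4), (1, 2), (5, 6), (3, 4), (1, 2)]
print(most_frequent_duplicates(rows))  # [((1, 2), 3), ((3, 4), 2)]
```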